24 research outputs found

    Optimising gene expression profiling using RNA-seq


    A systematic evaluation of single cell RNA-seq analysis pipelines

    The recent rapid spread of single cell RNA sequencing (scRNA-seq) methods has created a large variety of experimental and computational pipelines for which best practices have not yet been established. Here, we use simulations based on five scRNA-seq library protocols in combination with nine realistic differential expression (DE) setups to systematically evaluate three mapping, four imputation, seven normalisation and four differential expression testing approaches, resulting in approximately 3000 pipelines and allowing us to also assess interactions among pipeline steps. We find that choices of normalisation and library preparation protocols have the biggest impact on scRNA-seq analyses. Specifically, we find that library preparation determines the ability to detect symmetric expression differences, while normalisation dominates pipeline performance in asymmetric DE setups. Finally, we illustrate the importance of informed choices by showing that a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the sample size.

    The impact of amplification on differential expression analyses by RNA-seq

    Currently, quantitative RNA-seq methods are pushed to work with increasingly small starting amounts of RNA that require amplification. However, it is unclear how much noise or bias amplification introduces and how this affects precision and accuracy of RNA quantification. To assess the effects of amplification, reads that originated from the same RNA molecule (PCR duplicates) need to be identified. Computationally, read duplicates are defined by their mapping position, which does not distinguish PCR duplicates from natural duplicates, and hence it is unclear how to treat duplicated reads. Here, we generate and analyse RNA-seq data sets prepared using three different protocols (Smart-Seq, TruSeq and UMI-seq). We find that a large fraction of computationally identified read duplicates are not PCR duplicates and can be explained by sampling and fragmentation bias. Consequently, the computational removal of duplicates improves neither accuracy nor precision and can actually worsen the power and the False Discovery Rate (FDR) for differential gene expression. Even when duplicates are experimentally identified by unique molecular identifiers (UMIs), power and FDR are only mildly improved. However, the pooling of samples as made possible by the early barcoding of the UMI protocol leads to an appreciable increase in the power to detect differentially expressed genes.
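
    The distinction this abstract draws between position-based and UMI-based duplicate identification can be illustrated with a toy sketch. It is not the authors' code: the `Read` record, its fields and the example reads are hypothetical, and real pipelines work on aligned BAM records rather than tuples.

```python
from typing import NamedTuple


class Read(NamedTuple):
    # Hypothetical minimal read record, for illustration only.
    chrom: str
    pos: int
    strand: str
    umi: str  # unique molecular identifier attached before amplification


def dedup_by_position(reads):
    """Treat reads with the same mapping position as duplicates."""
    return {(r.chrom, r.pos, r.strand) for r in reads}


def dedup_by_umi(reads):
    """Treat reads as duplicates only if position AND UMI agree."""
    return {(r.chrom, r.pos, r.strand, r.umi) for r in reads}


reads = [
    Read("chr1", 100, "+", "AACG"),
    Read("chr1", 100, "+", "AACG"),  # PCR duplicate: same molecule
    Read("chr1", 100, "+", "TTGC"),  # natural duplicate: different molecule
    Read("chr1", 250, "-", "AACG"),
]

n_position = len(dedup_by_position(reads))  # collapses the natural duplicate too
n_umi = len(dedup_by_umi(reads))            # keeps the natural duplicate
print(n_position, n_umi)
```

    In this toy input, position-based removal discards a read that came from a distinct molecule, which is the kind of information loss the abstract describes.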

    zUMIs - A fast and flexible pipeline to process RNA sequencing data with UMIs

    Background: Single-cell RNA-sequencing (scRNA-seq) experiments typically analyze hundreds or thousands of cells after amplification of the cDNA. The high throughput is made possible by the early introduction of sample-specific barcodes (BCs), and the amplification bias is alleviated by unique molecular identifiers (UMIs). Thus, the ideal analysis pipeline for scRNA-seq data needs to efficiently tabulate reads according to both BC and UMI. Findings: zUMIs is a pipeline that can handle both known and random BCs and also efficiently collapse UMIs, either just for exon-mapping reads or for both exon- and intron-mapping reads. If BC annotation is missing, zUMIs can accurately detect intact cells from the distribution of sequencing reads. Another unique feature of zUMIs is the adaptive downsampling function, which facilitates dealing with hugely varying library sizes and also allows the user to evaluate whether the library has been sequenced to saturation. To illustrate the utility of zUMIs, we analyzed a single-nucleus RNA-seq dataset and show that more than 35% of all reads map to introns. We also show that these intronic reads are informative about expression levels, significantly increasing the number of detected genes and improving the cluster resolution. Conclusions: The flexibility of zUMIs makes it possible to accommodate data generated with any of the major scRNA-seq protocols that use BCs and UMIs, and it is the most feature-rich, fast, and user-friendly pipeline to process such scRNA-seq data.
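
    As a rough illustration of what tabulating reads according to both BC and UMI means, the sketch below counts distinct UMIs per barcode and gene. It is an assumption-laden toy, not the zUMIs implementation: the function name `count_umis`, the `(barcode, umi, gene)` record layout and the example data are hypothetical, and UMIs are collapsed by exact match only, with no correction for sequencing errors.

```python
from collections import defaultdict


def count_umis(records):
    """Count distinct UMIs per (barcode, gene) pair.

    `records` is an iterable of (barcode, umi, gene) tuples, one per
    mapped read. Reads sharing barcode, gene and UMI are collapsed
    into a single molecule (exact-match collapsing only).
    """
    umis = defaultdict(set)
    for barcode, umi, gene in records:
        umis[(barcode, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Hypothetical reads: two cells (barcodes), two genes.
records = [
    ("CELL_A", "AACG", "GeneX"),
    ("CELL_A", "AACG", "GeneX"),  # amplification copy, collapsed
    ("CELL_A", "TTGC", "GeneX"),
    ("CELL_A", "AACG", "GeneY"),
    ("CELL_B", "AACG", "GeneX"),
]

counts = count_umis(records)
for (barcode, gene), n in sorted(counts.items()):
    print(barcode, gene, n)
```

    The result is a sparse cell-by-gene table of molecule counts. Four reads from CELL_A collapse to three molecules because one read is an amplification copy.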

    Sensitive and powerful single-cell RNA sequencing using mcSCRB-seq

    Single-cell RNA sequencing (scRNA-seq) has emerged as a central genome-wide method to characterize cellular identities and processes. Consequently, improving its sensitivity, flexibility, and cost-efficiency can advance many research questions. Among the flexible plate-based methods, single-cell RNA barcoding and sequencing (SCRB-seq) is highly sensitive and efficient. Here, we systematically evaluate experimental conditions of this protocol and find that adding polyethylene glycol considerably increases sensitivity by enhancing cDNA synthesis. Furthermore, using Terra polymerase increases efficiency due to a more even cDNA amplification that requires less sequencing of libraries. We combined these and other improvements to develop a scRNA-seq library protocol we call molecular crowding SCRB-seq (mcSCRB-seq), which we show to be one of the most sensitive, efficient, and flexible scRNA-seq methods to date.

    Evolutionary routes and KRAS dosage define pancreatic cancer phenotypes.

    The poor correlation of mutational landscapes with phenotypes limits our understanding of the pathogenesis and metastasis of pancreatic ductal adenocarcinoma (PDAC). Here we show that oncogenic dosage-variation has a critical role in PDAC biology and phenotypic diversification. We find an increase in gene dosage of mutant KRAS in human PDAC precursors, which drives both early tumorigenesis and metastasis and thus rationalizes early PDAC dissemination. To overcome the limitations posed to gene dosage studies by the stromal richness of PDAC, we have developed large cell culture resources of metastatic mouse PDAC. Integration of cell culture genomes, transcriptomes and tumour phenotypes with functional studies and human data reveals additional widespread effects of oncogenic dosage variation on cell morphology and plasticity, histopathology and clinical outcome, with the highest KrasMUT levels underlying aggressive undifferentiated phenotypes. We also identify alternative oncogenic gains (Myc, Yap1 or Nfkb2), which collaborate with heterozygous KrasMUT in driving tumorigenesis, but have lower metastatic potential. Mechanistically, different oncogenic gains and dosages evolve along distinct evolutionary routes, licensed by defined allelic states and/or combinations of hallmark tumour suppressor alterations (Cdkn2a, Trp53, Tgfβ-pathway). Thus, evolutionary constraints and contingencies direct oncogenic dosage gain and variation along defined routes to drive the early progression of PDAC and shape its downstream biology. Our study uncovers universal principles of Ras-driven oncogenesis that have potential relevance beyond pancreatic cancer. The work was supported by the German Cancer Consortium Joint Funding Program, the Helmholtz Gemeinschaft (PCCC Consortium), the German Research Foundation (SFB1243; A13/A14) and the European Research Council (ERC CoG number 648521).

    Ageing and sources of transcriptional heterogeneity

    Cellular heterogeneity is an important contributor to biological function and is employed by cells, tissues and organisms to adapt, compensate, respond, defend and/or regulate specific processes. Research over the last decades has revealed that transcriptional noise is a major driver of cell-to-cell variability. In this review, we will discuss sources of transcriptional variability, in particular bursting of gene expression and how it could contribute to cellular states and fate decisions. We will highlight recent developments in single cell sequencing technologies that make it possible to address cellular heterogeneity in unprecedented detail. Finally, we will review recent literature in which these new technologies are harnessed to address pressing questions in the field of ageing research, such as transcriptional noise and cellular heterogeneity in the course of ageing.
